Effect size estimates ranged from -3.80 to 14.07 (\(M = 0.21, SD = 1.73\)). Five effect sizes lay more than 3 standard deviations from the mean effect size; consider removing these as outliers. Several studies reported multiple effect sizes (range: 1 to 30; most studies reported a single effect size).
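The 3-standard-deviation screening rule can be sketched as follows. This is a minimal Python illustration, not the R code used for the analysis; the function name `flag_outliers`, the use of the sample standard deviation, and the example values are assumptions made for the sketch.

```python
from statistics import mean, stdev

def flag_outliers(effect_sizes, n_sd=3.0):
    """Return the indices of effect sizes more than n_sd sample SDs from the mean."""
    m = mean(effect_sizes)
    s = stdev(effect_sizes)  # sample SD (n - 1 denominator); an assumption
    return [i for i, y in enumerate(effect_sizes) if abs(y - m) > n_sd * s]

# Hypothetical effect sizes, not the data analysed in this report.
example = [0.1, -0.2, 0.3, 0.0, 0.2, -0.1, 0.4, 0.1, -0.3, 0.2,
           0.0, 0.1, -0.2, 0.3, 0.2, -0.1, 0.0, 0.1, 0.2, 14.0]
flagged = flag_outliers(example)
print(flagged)
```

Because the mean and standard deviation are themselves inflated by extreme values, this rule is conservative in small samples.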
Meta-analysis was conducted in R (R Core Team 2021) using the R packages metafor (Viechtbauer et al. 2010) and pema (Van Erp S. 2021).
To estimate overall effects, we used three-level meta-analysis to account for dependent effect sizes within studies (Van den Noortgate et al. 2015).
Let \(y_{jk}\) denote the \(j\)-th observed effect size originating from study \(k\).
The multi-level model is then given by the following equations:
\[ \left. \begin{aligned} y_{jk} &= \beta_{jk} + \epsilon_{jk} &\text{where } \epsilon_{jk} &\sim N(0, \sigma^2_{\epsilon_{jk}})\\ \beta_{jk} &= \theta_k + w_{jk} &\text{where } w_{jk} &\sim N(0, \sigma^2_{w})\\ \theta_{k} &= \delta + b_{k} &\text{where } b_k &\sim N(0, \sigma^2_{b}) \end{aligned} \right\} \]
The first equation indicates that each observed effect size equals the underlying population effect size \(\beta_{jk}\), plus sampling error \(\epsilon_{jk}\). The second equation indicates that population effect sizes within a study are a function of a study-specific true effect size \(\theta_k\), plus within-study residuals \(w_{jk}\). The third equation indicates that study-specific true effect sizes are distributed around an overall mean effect \(\delta\), with between-study residuals \(b_k\).
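Substituting the second and third equations into the first gives a single-equation form that makes the variance decomposition explicit. Mutual independence of the three residual terms is the usual assumption for this model; it is stated here as an assumption rather than taken from the equations above.

\[ y_{jk} = \delta + b_k + w_{jk} + \epsilon_{jk}, \qquad \text{Var}(y_{jk}) = \sigma^2_b + \sigma^2_w + \sigma^2_{\epsilon_{jk}} \]

Under the same assumption, two effect sizes from the same study share only the term \(b_k\), so their covariance is \(\sigma^2_b\).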
Separate meta-analyses were conducted for each of the samples. The overall pooled effect sizes were:
The overall effect size estimate differed significantly from zero for CBDhumanexperimental, CBDanimalconditioned, AM404conditioned, and URB597conditioned.
The within-studies variance component \(\sigma^2_w\) (between effect sizes) was significant for CBDhumanexperimental, CBDanimalconditioned, CBDanimalunconditioned, AM404unconditioned, URB597conditioned, and URB597unconditioned.
The between-studies variance \(\sigma^2_b\) was significant for CBDanimalunconditioned, AM404unconditioned, URB597unconditioned, and PF3485unconditioned.
This indicates that there was substantial heterogeneity between average effect sizes, both within studies and across studies, in most of the samples.
The forest plots for the aforementioned three-level meta-analyses are presented below. Within each plot, studies are ranked by their sampling variance \(vi\); thus, the most precise estimates are at the bottom, near the overall effect.
Figure 0.1: Forest plot for CBDhumanexperimental
Figure 0.2: Forest plot for CBDanimalconditioned
Figure 0.3: Forest plot for CBDanimalunconditioned
Figure 0.4: Forest plot for AM404conditioned
Figure 0.5: Forest plot for AM404unconditioned
Figure 0.6: Forest plot for URB597conditioned
Figure 0.7: Forest plot for URB597unconditioned
Figure 0.8: Forest plot for PF3485unconditioned
The effect of multiple moderators was investigated using meta-regression.
For two continuous variables, dose and HED, a quadratic term was computed to examine the non-linear (U-shaped) effect.
For categorical variables, dummies were encoded.
Note that the resulting moderator matrix had 25 columns.
As this often exceeded the number of available effect sizes per sample,
these models were not identified.
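The construction of the moderator matrix, and the reason it can become non-identified, can be sketched as follows. This is an illustrative Python sketch, not the R code used here; the function `build_moderator_matrix`, the `_sq` suffix, the choice of one dummy per category level, and the example rows are all assumptions.

```python
def build_moderator_matrix(rows, continuous, categorical):
    """Expand rows (dicts) into named columns.

    Continuous variables get a linear and a quadratic column; categorical
    variables get one 0/1 dummy column per observed level.
    """
    columns = {}
    for var in continuous:
        values = [float(r[var]) for r in rows]
        columns[var] = values
        columns[var + "_sq"] = [v ** 2 for v in values]  # quadratic term
    for var in categorical:
        for level in sorted({r[var] for r in rows}):
            columns[f"{var}{level}"] = [1.0 if r[var] == level else 0.0 for r in rows]
    return columns

# Hypothetical sample with three effect sizes.
rows = [
    {"dose": 1.0, "sex": "Male", "anxiety_test": "EPM"},
    {"dose": 3.0, "sex": "Female", "anxiety_test": "OF"},
    {"dose": 10.0, "sex": "Both", "anxiety_test": "speaking"},
]
X = build_moderator_matrix(rows, ["dose"], ["sex", "anxiety_test"])
not_identified = len(X) > len(rows)  # more columns than effect sizes
print(len(X), len(rows), not_identified)
```

With more columns than effect sizes, the regression coefficients have no unique solution, which is the identification problem described above.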
Addressing this problem requires performing variable selection.
Three steps were taken to do so.
First, variables and categories that did not occur within one subset of the data were omitted.
Secondly, some dummy variables were redundant because some studies had identical values on multiple dummy variables.
Only one of these redundant dummy variables was retained, and its name was updated to reflect all redundant dummies it represents.
For example, all of the studies in the category “Both” of the variable “sex” used the “public speaking test,” and no other sex category used this test.
These two variables are therefore identical, and their effects cannot be distinguished.
Thus, the analysis shows their joint effect as an effect of sexBoth.anxiety_testspeaking.
Thirdly, despite these measures, many meta-regression models dropped all or some of the predictors,
or failed to converge entirely, suggesting the models were empirically non-identified.
Although these models are reported below,
we advise against their substantive interpretation.
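The first two selection steps can be sketched as follows: drop columns that do not vary within a subset, then collapse columns with identical values into one column named after all of its members. This is a Python illustration under assumptions; the function `prune_columns`, the `.` separator (chosen to match the name sexBoth.anxiety_testspeaking), and the example data are hypothetical.

```python
def prune_columns(columns):
    """Drop constant columns, then merge columns that have identical values."""
    merged = {}  # maps a tuple of values to the names of the columns sharing it
    for name, values in columns.items():
        if len(set(values)) < 2:
            continue  # step 1: no variation within this subset
        merged.setdefault(tuple(values), []).append(name)
    # step 2: keep one column per distinct pattern, named after all members
    return {".".join(names): list(values) for values, names in merged.items()}

# Hypothetical dummy-coded moderators for four effect sizes.
columns = {
    "sexBoth": [0, 0, 1, 1],
    "anxiety_testspeaking": [0, 0, 1, 1],  # identical to sexBoth
    "sexMale": [1, 1, 0, 0],
    "speciesRat": [0, 0, 0, 0],            # never occurs in this subset
    "dose": [1, 3, 10, 30],
}
pruned = prune_columns(columns)
print(list(pruned))
```

This removes exact duplicates only; columns that are linear combinations of other columns remain and can still cause the convergence problems described in the third step.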
The problems with meta-regression suggest that a technique is required that performs variable selection during analysis.
Such a technique was recently developed: Bayesian penalized meta-regression (BRMA), as implemented in the pema R-package (Van Erp S. 2021).
By imposing a regularizing (horseshoe) prior on the regression coefficients,
BRMA shrinks all coefficients towards zero, which aids empirical model identification.
Coefficients must overwhelm the prior in order to become significantly different from zero.
Thus, this method also performs variable selection: identifying which moderators are important in predicting the effect size.
The resulting regression coefficients are biased towards zero by design, but the estimate of residual heterogeneity \(\tau^2\) is unbiased.
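To give some intuition for why a horseshoe prior shrinks small coefficients strongly while leaving large ones relatively free, the sketch below draws from a plain horseshoe prior and compares it with a standard normal prior. It is a Python illustration under assumptions: the plain horseshoe with global scale 1 is used for simplicity, and the prior actually implemented in pema may differ (for example, a regularized variant).

```python
import math
import random

def horseshoe_draw(rng, global_scale=1.0):
    """One draw from a plain horseshoe prior.

    beta ~ N(0, (global_scale * lam)^2), with lam ~ half-Cauchy(0, 1).
    """
    lam = abs(math.tan(math.pi * (rng.random() - 0.5)))  # half-Cauchy via inverse CDF
    return rng.gauss(0.0, 1.0) * lam * global_scale

def share(draws, lower, upper):
    """Proportion of draws whose absolute value lies in [lower, upper)."""
    return sum(lower <= abs(d) < upper for d in draws) / len(draws)

rng = random.Random(1)
n = 20000
hs = [horseshoe_draw(rng) for _ in range(n)]
normal = [rng.gauss(0.0, 1.0) for _ in range(n)]

# Mass close to zero and mass in the tails, for each prior.
print(share(hs, 0.0, 0.1), share(normal, 0.0, 0.1))
print(share(hs, 3.0, math.inf), share(normal, 3.0, math.inf))
```

The horseshoe places more mass both near zero and far from zero than the normal prior does, which is what allows it to shrink negligible coefficients while letting strong effects overwhelm the prior.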
Note that, as this is a Bayesian model, inference is based on credible intervals.
A credible interval is interpreted as follows: The population value falls within this interval with 95% probability (certainty).
This is different from the interpretation of frequentist confidence intervals, which are interpreted as follows: In the long run, 95% of confidence intervals contain the population value.
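In practice, a credible interval is computed from posterior draws of a coefficient. The sketch below computes an equal-tailed 95% interval and checks whether it excludes zero. It is a Python illustration under assumptions: the draws are simulated rather than taken from a fitted model, and the software may report a different interval type (for example, a highest posterior density interval).

```python
import math
import random

def credible_interval(draws, level=0.95):
    """Equal-tailed interval from posterior draws, using order statistics."""
    ordered = sorted(draws)
    tail = (1.0 - level) / 2.0
    lower = ordered[math.floor(tail * (len(ordered) - 1))]
    upper = ordered[math.ceil((1.0 - tail) * (len(ordered) - 1))]
    return lower, upper

# Hypothetical posterior draws for one regression coefficient.
rng = random.Random(2021)
draws = [rng.gauss(0.5, 0.2) for _ in range(4000)]
lower, upper = credible_interval(draws)
excludes_zero = lower > 0.0 or upper < 0.0
print(round(lower, 2), round(upper, 2), excludes_zero)
```

A coefficient whose interval excludes zero is the Bayesian counterpart of what this report calls a significant effect.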
As the models contained several categorical variables but no meaningful reference category, we estimated models without an intercept and instead included one dummy for each unique category (ANOVA specification instead of regression specification). To further aid interpretability, the dependent variable (effect size) was centered around the overall effect estimated in the three-level meta-analysis. The regression slope for each dummy variable therefore reflects the deviation of that category from this overall effect. If a dummy variable has a significant effect, that group’s mean differs significantly from the overall mean. Note that in penalized regression, predictors are usually standardized. However, the effect of standardized dummies cannot be meaningfully interpreted. Therefore, only continuous predictors were standardized in this analysis. This may give dummy variables a slight advantage, leading them to become significant sooner than continuous ones.
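The interpretation of slopes under the ANOVA specification can be illustrated with a simplified case. For a single categorical moderator, an unweighted and unpenalized no-intercept regression on one dummy per category reduces to the category means of the centered effect sizes. The Python sketch below shows only this special case; the function `category_deviations` and the example values are assumptions, and the actual analysis additionally involves weighting, multiple moderators, and shrinkage.

```python
from statistics import mean

def category_deviations(effect_sizes, categories, overall_effect):
    """Slopes of a no-intercept dummy regression on centered effect sizes.

    Unweighted, single-factor case: each slope equals the category mean
    minus the overall effect.
    """
    centered = [y - overall_effect for y in effect_sizes]
    groups = {}
    for y, c in zip(centered, categories):
        groups.setdefault(c, []).append(y)
    return {c: mean(ys) for c, ys in groups.items()}

# Hypothetical effect sizes in two categories, with overall effect 0.5.
deviations = category_deviations([0.2, 0.4, 0.9, 1.1], ["A", "A", "B", "B"], 0.5)
print({c: round(v, 3) for c, v in deviations.items()})
```

Here category A lies below the overall effect and category B above it, which is how the dummy slopes in the tables should be read.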
Note that analyses containing VIF values greater than 5 should be regarded as problematic, due to multicollinearity. This applies to nearly all models.
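For reference, the textbook variance inflation factor of predictor \(j\) is \(1 / (1 - R^2_j)\), where \(R^2_j\) comes from regressing that predictor on all others; equivalently, it is the \(j\)-th diagonal element of the inverse correlation matrix of the predictors. The Python sketch below uses this unweighted definition as an assumption; VIF values reported for weighted meta-regression models may be computed differently, and the function names and data are hypothetical.

```python
from statistics import mean

def correlation_matrix(columns):
    """Pearson correlations between equally long numeric columns."""
    centered = [[v - mean(col) for v in col] for col in columns]
    norms = [sum(v * v for v in col) ** 0.5 for col in centered]
    p = len(columns)
    return [[sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
             for j in range(p)] for i in range(p)]

def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def vif(columns):
    """VIF of each predictor: diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]

# Two hypothetical, strongly correlated predictors.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [1.0, 2.0, 3.0, 4.0, 5.0, 7.0]
values = vif([x1, x2])
print([round(v, 1) for v in values], all(v > 5 for v in values))
```

Perfectly redundant predictors make the correlation matrix singular, in which case the sketch raises an error instead of returning a VIF.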
Model for CBDhumanexperimental did not converge.
Model for AM404conditioned did not converge.
Based on three-level multilevel meta-analyses, there is limited evidence that overall effects are non-zero in the population, except for the samples “Acq retr to ctx” and “Ext retr to ctx.” All samples showed significant between-studies variance, except “Ext retr to cue.” Conversely, none of the samples showed significant within-studies variance, except “Acq retr to ctx.” There is thus substantial evidence that heterogeneity in effect sizes is mostly due to between-studies differences.
Classic meta-regression analyses were largely invalid for moderator analysis because of high multicollinearity among predictors. BRMA analyses, which are robust to multicollinearity and perform variable selection by shrinking regression coefficients towards zero, were used instead. These BRMA analyses revealed no consistent evidence of any significant moderator effect across samples.